feat: add VolumeAttachment wait after drain using cached client #207

Merged: Trojan295 merged 18 commits into main from wait-for-volume-attachments-in-drain on Jan 15, 2026
Conversation

@Trojan295 Trojan295 (Contributor) commented Jan 8, 2026

Adds an optional feature to wait for VolumeAttachments to be deleted after node drain, preventing CSI Multi-Attach errors.

Key Changes:

  • New VolumeDetachmentWaiter - waits for VAs to be deleted using informer cache with node name index
  • Per-action control via API - waitForVolumeDetach (bool) and volumeDetachTimeoutSeconds (int) fields on drain action
  • Uses informer manager and adds VolumeAttachment informer with index on spec.nodeName

New Environment Variables:

| Variable | Default | Description |
|---|---|---|
| `DRAIN_DISABLE_VOLUME_DETACH_WAIT` | `false` | Forcefully disables the VolumeAttachment wait |
| `DRAIN_VOLUME_DETACH_TIMEOUT` | `60s` | Default VolumeAttachment wait timeout |
| `INFORMER_CACHE_SYNC_TIMEOUT` | `1m` | Informer sync timeout |

Things to Remember

  1. Timeout is non-fatal - logs warning and proceeds with node deletion
  2. Excludes DaemonSet/static pod VAs - avoids deadlock waiting for non-evictable pods
  3. The API controls the feature, but it can be force-disabled via the environment variable

@Trojan295 Trojan295 force-pushed the wait-for-volume-attachments-in-drain branch from 31bfc06 to 232802c on January 13, 2026 08:58
@Trojan295 Trojan295 marked this pull request as ready for review January 13, 2026 15:10
@Trojan295 Trojan295 requested a review from a team as a code owner January 13, 2026 15:10
| `API_KEY` | CAST AI API key (required) | - |
| `API_URL` | CAST AI API URL (required) | - |
| `CLUSTER_ID` | CAST AI cluster ID (required) | - |
| `DRAIN_VOLUME_DETACH_TIMEOUT` | Default timeout for waiting for VolumeAttachments to detach during node drain | `60s` |
Contributor

Is this a good idea? Feel free to ignore if it's not.

Suggested change
| `DRAIN_VOLUME_DETACH_TIMEOUT` | Default timeout for waiting for VolumeAttachments to detach during node drain | `60s` |
| `FALLBACK_DRAIN_VOLUME_DETACH_TIMEOUT` | Default timeout for waiting for VolumeAttachments to detach during node drain | `60s` |

Contributor Author

Hmm... how about DRAIN_VOLUME_DETACH_DEFAULT_TIMEOUT? That makes it clearer to the user that it's a default value, but one that can be overridden by something (the API payload in this case).

Comment on lines +19 to +20
// May be nil if informer sync failed.
VAWaiter VolumeDetachmentWaiter
Contributor

Do you think the config should instead contain only static configuration and handler objects should be passed separately?

Contributor Author

Yeah, makes sense, given that Kubernetes clients are already passed separately.

r.Equal(60*time.Second, h.getVolumeDetachTimeout(req))
})

t.Run("returns per-action timeout when set", func(t *testing.T) {
Contributor

These tests seem repetitive; could they be rewritten as table tests?

Contributor

@Tsonov Tsonov left a comment

Overall LGTM, mostly polishing questions

@Trojan295 Trojan295 force-pushed the wait-for-volume-attachments-in-drain branch from b58ae66 to 1a244a6 on January 14, 2026 13:24
Add optional feature to wait for VolumeAttachments to be deleted after
draining a node, preventing Multi-Attach errors when CSI drivers need
time to clean up volumes before the node is terminated.

Uses controller-runtime's cached client with field indexes for efficient
VolumeAttachment and Pod queries without hitting the API server directly.
Follows Karpenter's approach of excluding VAs from non-drainable pods
(DaemonSets, static pods) to avoid deadlocks.

Key changes:
- Add Drain config section with WaitForVolumeDetach, VolumeDetachTimeout,
  and CacheSyncTimeout settings
- Create controller-runtime cache with field indexes for VolumeAttachment
  and Pod resources by spec.nodeName
- Implement getVolumeAttachmentsForNode() using Karpenter-style filtering
- Graceful degradation: if cache sync fails, feature is disabled but
  drain operations continue normally

Environment variables:
- DRAIN_WAIT_FOR_VOLUME_DETACH: Enable feature (default: false)
- DRAIN_VOLUME_DETACH_TIMEOUT: Max wait time (default: 60s)
- CACHE_SYNC_TIMEOUT: Cache sync timeout (default: 120s)
…tach

The spec.nodeName field selector is not supported by the Kubernetes API
server for VolumeAttachment resources. This caused the waitForVolumeDetach
function to fail silently.

Switch to using the controller-runtime cached client which has a custom
field indexer configured for spec.nodeName lookups on VolumeAttachments.
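Because the API server rejects `spec.nodeName` field selectors for VolumeAttachments, lookups have to be served from a local index built over the cache. A toy version of such an index, using a plain map rather than controller-runtime's actual indexer, could look like:

```go
package main

import (
	"fmt"
	"sync"
)

// nodeIndex is a toy stand-in for a cache index keyed on spec.nodeName.
type nodeIndex struct {
	mu     sync.RWMutex
	byNode map[string]map[string]struct{} // node name -> set of VA names
}

func newNodeIndex() *nodeIndex {
	return &nodeIndex{byNode: map[string]map[string]struct{}{}}
}

// add records a VolumeAttachment under its node (called on informer Add).
func (i *nodeIndex) add(node, va string) {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.byNode[node] == nil {
		i.byNode[node] = map[string]struct{}{}
	}
	i.byNode[node][va] = struct{}{}
}

// remove drops a VolumeAttachment (called on informer Delete).
func (i *nodeIndex) remove(node, va string) {
	i.mu.Lock()
	defer i.mu.Unlock()
	delete(i.byNode[node], va)
}

// list returns the VA names still attached to a node.
func (i *nodeIndex) list(node string) []string {
	i.mu.RLock()
	defer i.mu.RUnlock()
	out := make([]string, 0, len(i.byNode[node]))
	for va := range i.byNode[node] {
		out = append(out, va)
	}
	return out
}

func main() {
	idx := newNodeIndex()
	idx.add("node-1", "va-1")
	idx.add("node-1", "va-2")
	idx.remove("node-1", "va-1")
	fmt.Println(len(idx.list("node-1"))) // 1
}
```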
…onfig

- Fix typos: EnablePodInfomer -> EnablePodInformer, EnableNodeInfomer -> EnableNodeInformer
- Add nil checks in addIndexers and reportCacheSize for disabled informers
- Fix Start() to sync node/pod informers independently
- Fix VA getter logic to check vaAvailable flag instead of nil
- Add DisableWaitForVolumeDetach config option
- Update run.go to use new informer options
- Update tests to enable informers explicitly
…table-driven tests

- Move VolumeDetachmentWaiter from internal/actions to internal/volume package
- Rename to DetachmentWaiter for cleaner API within volume package
- Update drain handler and config to use new package location
- Convert TestShouldWaitForVolumeDetach and TestGetVolumeDetachTimeout to table-driven tests
- Rename config field VolumeDetachTimeout to VolumeDetachDefaultTimeout for clarity
Add explicit SelfSubjectAccessReview check before starting the VA informer
to verify get/list/watch permissions on volumeattachments.storage.k8s.io.

Benefits:
- Fast feedback (~50ms) vs waiting for 30s sync timeout
- Clear error messages indicating which specific permission is missing
- Graceful degradation - VA feature disabled if permissions not granted

The check uses the standard authorization.k8s.io/v1 SelfSubjectAccessReview
API which any authenticated user can call to check their own permissions.
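The check's control flow can be sketched generically; here `can` is a hypothetical stand-in for issuing one SelfSubjectAccessReview per verb (the real code would call the authorization.k8s.io/v1 API via client-go).

```go
package main

import (
	"fmt"
	"strings"
)

// can is a stand-in for a SelfSubjectAccessReview call for one verb.
type can func(verb string) (bool, error)

// checkVAPermissions verifies get/list/watch on volumeattachments and
// reports which verbs are missing so the error message is actionable.
func checkVAPermissions(check can) error {
	var missing []string
	for _, verb := range []string{"get", "list", "watch"} {
		ok, err := check(verb)
		if err != nil {
			return fmt.Errorf("permission check failed: %w", err)
		}
		if !ok {
			missing = append(missing, verb)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing verbs on volumeattachments.storage.k8s.io: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Fake reviewer that only allows get and list.
	allowed := map[string]bool{"get": true, "list": true}
	err := checkVAPermissions(func(v string) (bool, error) { return allowed[v], nil })
	fmt.Println(err) // names "watch" as the missing verb
}
```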
@Trojan295 Trojan295 force-pushed the wait-for-volume-attachments-in-drain branch from ad83b3f to a777d17 on January 14, 2026 16:33
log logrus.FieldLogger,
clientset kubernetes.Interface,
castNamespace string,
volumeDetachTimeout time.Duration,
Contributor

wouldn't it make more sense to move the default timeout to the waiter instead? Then the following logic wouldn't be needed in the waiter constructor:

	if timeout == 0 {
		timeout = DefaultVolumeDetachTimeout
	}

and this drain handler wouldn't need to know about the default which should be a bit simpler logic

Contributor Author

That also makes sense. My reasoning was that we could reuse this somewhere else where we'd want a different default timeout, but I don't see any objection to a default timeout on the waiter.

vaIndexer cache.Indexer,
pollInterval time.Duration,
) DetachmentWaiter {
if vaIndexer == nil {
Contributor

The caller can already check for such a condition; there's no need to hide that complexity at lower levels. Constructors should return something useful when called, or return an error if something is missing. That helps avoid silent failures.
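The error-returning constructor the reviewer describes could look like this (a sketch; `Indexer` here is a local placeholder, not client-go's `cache.Indexer`):

```go
package main

import (
	"errors"
	"fmt"
)

// Indexer is a placeholder for the cache indexer dependency.
type Indexer interface {
	ByIndex(indexName, key string) ([]interface{}, error)
}

type DetachmentWaiter struct {
	idx Indexer
}

// NewDetachmentWaiter fails loudly instead of silently degrading when a
// required dependency is missing, as the review suggests.
func NewDetachmentWaiter(idx Indexer) (*DetachmentWaiter, error) {
	if idx == nil {
		return nil, errors.New("volume attachment indexer is required")
	}
	return &DetachmentWaiter{idx: idx}, nil
}

func main() {
	_, err := NewDetachmentWaiter(nil)
	fmt.Println("constructor error:", err != nil) // the caller decides how to degrade
}
```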

// Returns (true, nil) if all permissions are available.
// Returns (false, nil) if any permission is missing (logs which ones).
// Returns (false, error) if the permission check itself fails.
func (m *Manager) checkVAPermissions(ctx context.Context) (bool, error) {
Contributor

Wouldn't it be enough to just return an error for the generic-error case, a noPermissions-typed error, or no error when everything is good?

@Trojan295 Trojan295 merged commit 1619850 into main Jan 15, 2026
3 checks passed
@Trojan295 Trojan295 deleted the wait-for-volume-attachments-in-drain branch January 15, 2026 14:25